
Positive Unlabeled data simulation

To comprehensively assess the proposed method, we designed a series of simulation studies. Four scenarios for the noise distribution are simulated: a balanced population with two well-separated classes (the clear balanced scenario); a balanced population with two poorly separated classes (the noisy balanced scenario); an unbalanced population with two well-separated classes (the clear unbalanced scenario); and an unbalanced population with two poorly separated classes (the noisy unbalanced scenario). Population imbalance and class separation are controlled by using a different propensity score function for each scenario: [image: propensity score functions]
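As a rough illustration of how two knobs can produce the four scenarios, here is a minimal sketch in base R. It is not the package's `PU_data_simulation()`: the function `simulate_pu_sketch()`, its arguments (`balanced`, `separated`, `label_rate`), the logistic form of the propensity score, and the specific slope and shift values are all assumptions made for illustration, since the actual propensity score functions are given only in the figure.

```r
# Hypothetical sketch, NOT the PLUS implementation of PU_data_simulation().
simulate_pu_sketch <- function(N = 200, p = 100, balanced = TRUE,
                               separated = TRUE, label_rate = 0.5, seed = 1) {
  set.seed(seed)
  X <- matrix(rnorm(N * p), nrow = N, ncol = p)

  # Assumption: only the first few features carry signal.
  n_signal <- min(5, p)
  beta <- c(rep(1, n_signal), rep(0, p - n_signal))
  z <- as.vector(X %*% beta) / sqrt(n_signal)  # roughly standard normal score

  # Assumed knobs: a steep slope gives well-separated classes,
  # a positive shift makes positives rarer (unbalanced population).
  slope <- if (separated) 4 else 1
  shift <- if (balanced) 0 else 1
  propensity <- plogis(slope * (z - shift))

  # True class labels drawn from the assumed propensity score.
  y_true <- rbinom(N, size = 1, prob = propensity)

  # Positive-unlabeled step: only a fraction of true positives is labeled;
  # every other sample is left as 0 (unlabeled).
  pos <- which(y_true == 1)
  n_lab <- floor(label_rate * length(pos))
  y_obs <- integer(N)
  y_obs[pos[sample.int(length(pos), n_lab)]] <- 1L

  list(X = X, y_true = y_true, y_obs = y_obs)
}

clear_balanced   <- simulate_pu_sketch(balanced = TRUE,  separated = TRUE)
noisy_unbalanced <- simulate_pu_sketch(balanced = FALSE, separated = FALSE)
table(true = noisy_unbalanced$y_true, observed = noisy_unbalanced$y_obs)
```

In this sketch the observed label is never 1 for a true negative, which is the defining property of positive-unlabeled data; how the package itself selects labeled positives may differ.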

Usage

simul_data <- PU_data_simulation(p = 100, N = 200, confident_rate = 0.5, scenario = 'noisy_balance', valid = '01')

Arguments

Value

The result is a list with three elements: pred.y gives the probability that each sample is predicted as positive; cutoff is the reference cutoff for converting the continuous probabilities to binary 0/1 labels; pred.coef1 holds the variable coefficients used in the prediction model.
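The role of the cutoff can be sketched with a few made-up values. The numbers below are illustrative only, and the choice to call a sample positive when its probability is at or above the cutoff (rather than strictly above) is an assumption, not something the package documentation states.

```r
# Made-up values standing in for the pred.y and cutoff elements of the result.
pred.y <- c(0.92, 0.15, 0.58, 0.40)
cutoff <- 0.5

# Assumption: samples at or above the cutoff are labeled positive (1).
pred.label <- as.integer(pred.y >= cutoff)
pred.label
# [1] 1 0 1 0
```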

Example

# Load the R packages used by the PLUS package
library(PLUS)
library(glmnet)

# Example training data and observed labels shipped with PLUS
X <- PLUS::example_data$train_data
Label <- PLUS::example_data$Label.obs

# Fit the PLUS model
Prediction <- PLUS(train_data = X, Label.obs = Label, Sample_use_time = 30, l.rate = 1, qq = 0.1)

Contact Information

Ph.D. candidate, Indiana University School of Medicine

Ph.D. candidate, Department of Biostatistics, Indiana University

Reference

Zhou, J., Lu, X., Chang, W., Wan, C., Zhang, C. and Cao, S., 2020. PLUS: predicting pan-cancer metastasis potential based on positive and unlabeled learning



zcslab/PLUS documentation built on June 20, 2020, 1:01 a.m.